hal-00717572, version 1 - 13 Jul 2012 




LaBRI, CNRS UMR 5800
Laboratoire Bordelais de Recherche en Informatique 


Research report RR-14XX-12


Two-way automata and regular languages of overlapping tiles 


July 13, 2012 


Anne Dicky, David Janin, 
LaBRI, Université de Bordeaux 




Abstract 


In this paper, we show how the study of two-way automata on words can be relevantly extended to the study of two-way automata on one-dimensional overlapping tiles, which generalize finite words. Indeed, over tiles, the languages recognizable by finite two-way automata (resp. multi-pebble automata) coincide with the languages definable by Kleene's (resp. Kleene's extended) regular expressions.

As an immediate corollary, if we restrict our observations to words, we 
obtain a new proof of Shepherdson’s theorem which posits that every finite 
state two-way automaton is equivalent to a finite state one-way automaton. 
We also obtain a new proof that this is still true for two-way automata with 
pebbles. 

Concerning tiles, we show that adding pebbles strictly increases the expressive power of two-way automata. The hierarchy induced by the number of allowed pebbles is however shown to collapse to level one: a single pebble is enough to reach maximal expressive power, namely the class of languages definable in monadic second-order logic.


1 Introduction 


1.1 Background 


In a seminal paper [12], Rabin and Scott defined finite state two-way automata 
on words. Then Shepherdson proved that the two-way automata have the same 
expressive power as one-way automata [13]. A few years later, two-way automata 
were extended with pebbles. Again, these automata have been proved to be no 
more expressive than one-way automata (see [3] for a state of the art and many 
more observations). 

Despite these negative results, two-way automata have been the subject of many studies. This can be partly explained by their intriguing combinatorial complexity. For instance, Shepherdson's result has been considered difficult for many years [14]. More precisely, the capacity of two-way automata to read each letter an unbounded number of times makes the structure of their runs difficult to analyse. This is especially clear in Pécuchet's and Birget's algebraic studies of two-way automata [10, 1], in which two-way runs give rise to a rich combinatorial structure. A similar complexity is illustrated by Globerman and Harel's result [3] that the number of allowed pebbles in two-way automata induces a “succinctness” hierarchy: each additional pebble provides inherent exponential power.
Still, gaining a full understanding of two-way automata, with or without pebbles,
remains a challenging topic, especially if we consider tree-walking automata: the 
extension of two-way automata to languages of trees. It has recently been shown 
in a number of studies (see [2] for an overview of these results) that, in the case of 
trees, each additional pebble provides extra expressive power. Yet, the decidability 
of this “Pebble” hierarchy is an open problem. 

Classical (one-way) finite automata theory has benefited from a rich algebraic language theory that led, and still leads, to many decision algorithms [11]. But there is no algebraic characterization of two-way automata, neither for trees nor even for finite words [1]. This suggests that the mathematical properties of two-wayness and of (the more difficult) pebble handling are not yet completely understood, even in the simplest case.


1.2 Outline 


In this paper, we focus on two-way automata over words, but we define their semantics in terms of overlapping tiles [4] instead of words. Tiles can be seen as the domains of partial runs: runs that may start and stop anywhere on the input word. We then show that the combination of successive partial runs corresponds to a sequential product of tiles, yielding a structure isomorphic to McAlister's inverse monoid [8].

Embedding words into overlapping tiles enables us to apply the most classical techniques used over subsets of the free monoid in a straightforward way. As a result, we show that the regular languages of tiles (definable by Kleene expressions) are exactly the languages of tiles recognizable by finite-state two-way automata. We also prove that the (strictly larger) class of MSO-definable languages of tiles is the class of languages recognizable by finite-state many-pebble automata. Moreover, one-pebble automata are shown to capture the whole class. Shepherdson's theorem and analogous results for pebble automata are then obtained as immediate corollaries.

All these results support the long-standing intuition [10, 1, 6] that the theory of inverse monoids [7] is a powerful conceptual tool in the study of two-way automata. This is illustrated in particular by the simplicity of all the proofs presented here.


1.3 Preliminaries 


The free monoid. Given a finite alphabet $A$, $A^*$ denotes the free monoid generated by $A$ and $1$ denotes its neutral element. The concatenation of two words $u$ and $v$ is denoted by $uv$.

Prefix and suffix lattices. $\leq_p$ stands for the prefix order over $A^*$ and $\leq_s$ for the suffix order. $\vee_p$ (resp. $\vee_s$) denotes the join operator for the prefix (resp. suffix) order: for all words $u$ and $v$, $u \vee_p v$ (resp. $u \vee_s v$) is the least word of which both $u$ and $v$ are prefixes (resp. suffixes).

The extended monoid $A^* + \{0\}$ (with $0u = u0 = 0$ for every word $u$), ordered by $\leq_p$ (extended with $u \leq_p 0$ for every word $u$), is a lattice; in particular, $u \vee_p v = 0$ whenever neither $u$ is a prefix of $v$ nor $v$ is a prefix of $u$. Symmetric properties hold in the suffix lattice.
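As a concrete illustration, both joins reduce to comparing words. The following Python sketch rests on assumptions of our own: words of $A^*$ are Python strings, the empty string stands for $1$, and `None` stands for the extra element $0$. The helper names `join_p` and `join_s` are hypothetical.

```python
from typing import Optional


def join_p(u: str, v: str) -> Optional[str]:
    """u ∨_p v: the least word having both u and v as prefixes.

    Returns None, standing for the extra element 0, when neither word
    is a prefix of the other.
    """
    if v.startswith(u):
        return v
    if u.startswith(v):
        return u
    return None


def join_s(u: str, v: str) -> Optional[str]:
    """u ∨_s v: the least word having both u and v as suffixes, or None for 0."""
    if v.endswith(u):
        return v
    if u.endswith(v):
        return u
    return None


print(join_p("ab", "abc"), join_p("ab", "ac"))  # abc None
print(join_s("ab", "b"), join_s("a", "b"))      # ab None
```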

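Syntactic inverses. Given $\bar A$ a disjoint copy of $A$, $u \mapsto \bar u$ denotes the mapping from $(A + \bar A)^*$ to itself inductively defined by $\bar 1 = 1$, for every letter $a \in A$, $\bar a$ is the copy of $a$ in $\bar A$ and $\bar{\bar a} = a$, and, for every letter $x \in A + \bar A$ and every word $u \in (A + \bar A)^*$, $\overline{xu} = \bar u\,\bar x$. The mapping $u \mapsto \bar u$ is involutive ($\bar{\bar u} = u$ for every word $u$); it is an antimorphism of the free monoid $(A + \bar A)^*$, i.e. for all words $u$ and $v \in (A + \bar A)^*$, $\overline{uv} = \bar v\,\bar u$.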

Free group. The free group $FG(A)$ generated by $A$ is the quotient of $(A + \bar A)^*$ by the least congruence $\simeq$ such that, for every letter $a \in A$, $a\bar a \simeq 1$ and $\bar a a \simeq 1$. Let $\to$ be the rewriting relation induced by the rules $a\bar a \to 1$ and $\bar a a \to 1$ for every $a \in A$. It is well known that every class $[u] \in FG(A)$ contains a unique element $red(u)$ (the reduced form of $u$) irreducible with respect to $\to$, i.e. containing no factor of the form $a\bar a$ or $\bar a a$.
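A minimal sketch of the inverse map and of $red$, assuming a hypothetical encoding in which letters of $A$ are lowercase characters and their copies in $\bar A$ are the corresponding uppercase characters. The reduction pushes letters on a stack and cancels a letter against the top of the stack whenever they form a factor $a\bar a$ or $\bar a a$; since the rewriting system is confluent, the result is $red(u)$.

```python
def bar(w: str) -> str:
    """Syntactic inverse: reverse w and swap each letter with its copy."""
    return w[::-1].swapcase()


def red(w: str) -> str:
    """Reduced form of w in the free group FG(A)."""
    stack = []
    for x in w:
        if stack and stack[-1] == x.swapcase():
            stack.pop()  # cancel a factor a·ā or ā·a
        else:
            stack.append(x)
    return "".join(stack)


print(bar("abC"))   # cBA
print(red("baAC"))  # bC
```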

Free inverse monoid. The free inverse monoid $FIM(A)$ generated by $A$ is the quotient of $(A + \bar A)^*$ by the Wagner congruence $\sim_W$, i.e. the least congruence such that $u\bar u u \sim_W u$ and $u\bar u v\bar v \sim_W v\bar v u\bar u$ for all $u, v \in (A + \bar A)^*$.


2 The monoid of overlapping tiles 


Here we give a description of the monoid of overlapping tiles, which is shown to be isomorphic to McAlister's monoid [8]. The tight link between (two-way linear) walks on words and tiles is formalized by means of an onto morphism whose kernel over walks is proved to be the Wagner congruence.


2.1 Positive, negative and context tiles 


A tile over the alphabet $A$ is a triple of words $u = (u_1, u_2, u_3) \in A^* \times (A^* + \bar A^*) \times A^*$ such that, if $u_2 \in \bar A^*$, its inverse $\bar u_2$ is a suffix of $u_1$ and a prefix of $u_3$.

When $u_2 \in A^*$ we say that $u$ is a positive tile. When $u_2 \in \bar A^*$ we say that $u$ is a negative tile. When $u_2 = 1$, i.e. when $u$ is both positive and negative, we say that $u$ is a context tile. The sets $T_A$, $T_A^+$, $T_A^-$ and $C_A$ will respectively denote the set of tiles, the set of positive tiles, the set of negative tiles and the set of context tiles over $A$.

The domain of a tile $u = (u_1, u_2, u_3)$ is the reduced form of $u_1u_2u_3$ (always a word of $A^*$); its root is the word $u_2$.


A positive tile $u = (u_1, u_2, u_3)$ is conveniently drawn as a (linear, unidirectional and left to right) Munn's birooted word tree [9]:


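[Figure: the word $u_1u_2u_3$, with the input arrow at the beginning of $u_2$ and the output arrow at its end.]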


where the dangling input arrow (marking the beginning of the root) appears on the left of the dangling output arrow (marking the end of the root). A negative tile of the form $u = (u_1u_2, \bar u_2, u_2u_3) \in A^* \times \bar A^* \times A^*$ is also drawn as a birooted word tree


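[Figure: the word $u_1u_2u_3$, with the output arrow at the beginning of $u_2$ and the input arrow at its end.]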


where the dangling input arrow appears on the right of the dangling output arrow. A context tile of the form $u = (u_1, 1, u_3) \in A^* \times \{1\} \times A^*$ is then drawn as follows:


[Figure: the word $u_1u_3$, with both arrows at the boundary between $u_1$ and $u_3$.]




2.2 The inverse monoid of tiles 


In this part, we abusively denote a word $u$ of $A^* + \bar A^*$ by any word of $(A + \bar A)^*$ whose reduced form is $u$.

Intuitively, the sequential product of two tiles is their superposition in such a 
way that the end of the root of the first tile coincides with the beginning of the root 
of the second tile; the superposition requires pattern-matching conditions to the left 
and to the right of the synchronization point. When both tiles are positive, this can 
be drawn as follows: 

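[Figure: the tile $u_1\,u_2\,u_3$ drawn above the tile $v_1\,v_2\,v_3$, the end of the root $u_2$ aligned with the beginning of the root $v_2$.]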


The product can be extended to arbitrary tiles, as illustrated by the following figure 
(positive u and negative v): 
Formally, we extend the set $T_A$ with a zero tile to obtain $T_A^0 = T_A + \{0\}$. The sequential product of two non-zero tiles $u = (u_1, u_2, u_3)$ and $v = (v_1, v_2, v_3)$ is defined as

$$u.v = \big((u_1u_2 \vee_s v_1)\,\bar u_2,\ u_2v_2,\ \bar v_2\,(u_3 \vee_p v_2v_3)\big)$$

when both $u_1u_2 \vee_s v_1 \neq 0$ and $u_3 \vee_p v_2v_3 \neq 0$, and $u.v = 0$ otherwise. We let $u.0 = 0.u = 0$ for every $u \in T_A^0$.


Remark. Let $a$, $b$, $c$ and $d \in A$ be distinct letters. Then $(a, b, c).(b, c, d) = (a, bc, d)$, whereas $(a, b, c).(a, c, d) = 0$. In the latter case, the left matching constraint is violated: $ab \vee_s a = 0$ because $a \neq b$.
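The product formula transcribes directly into code. The sketch below is not taken from the paper. It assumes an illustrative encoding (lowercase characters for letters of $A$, uppercase for their copies in $\bar A$, the empty string for $1$, `None` for the zero tile) and computes all reduced forms with a stack. It reproduces the two products of the Remark.

```python
from typing import Optional, Tuple

Tile = Tuple[str, str, str]  # (u1, u2, u3)


def bar(w: str) -> str:
    """Syntactic inverse: reverse w and swap each letter with its copy."""
    return w[::-1].swapcase()


def red(w: str) -> str:
    """Free-group reduced form (cancels a·ā and ā·a with a stack)."""
    stack = []
    for x in w:
        if stack and stack[-1] == x.swapcase():
            stack.pop()
        else:
            stack.append(x)
    return "".join(stack)


def join_p(u: str, v: str) -> Optional[str]:
    return v if v.startswith(u) else u if u.startswith(v) else None


def join_s(u: str, v: str) -> Optional[str]:
    return v if v.endswith(u) else u if u.endswith(v) else None


def mul(u: Optional[Tile], v: Optional[Tile]) -> Optional[Tile]:
    """Sequential product u.v; None encodes the zero tile."""
    if u is None or v is None:
        return None
    (u1, u2, u3), (v1, v2, v3) = u, v
    left = join_s(red(u1 + u2), v1)    # left of the synchronization point
    right = join_p(u3, red(v2 + v3))   # right of the synchronization point
    if left is None or right is None:
        return None
    return (red(left + bar(u2)), red(u2 + v2), red(bar(v2) + right))


print(mul(("a", "b", "c"), ("b", "c", "d")))  # ('a', 'bc', 'd')
print(mul(("a", "b", "c"), ("a", "c", "d")))  # None
```

The three components are reduced in the free group before being returned, following the convention that elements of $A^* + \bar A^*$ are always kept in reduced form.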


Theorem 1 The set $T_A^0$ equipped with the sequential product of tiles is an inverse monoid with neutral element $1 = (1, 1, 1)$ and (pseudo) inverses given by $0^{-1} = 0$ and, for every non-zero tile $u = (u_1, u_2, u_3) \in T_A$, $u^{-1} = (u_1u_2, \bar u_2, u_2u_3)$.


Proof. Since $1$ is obviously the neutral element and $0$ the absorbing element, it remains to prove that the sequential product is well defined (or sound) and associative.

Soundness. We first prove that the sequential product of two tiles is a tile. Let $u = (u_1, u_2, u_3)$ and $v = (v_1, v_2, v_3)$ be tiles such that $u.v \neq 0$. In all cases, with $u$ or $v$ positive or negative, one can check that $(u_1u_2 \vee_s v_1)\,\bar u_2 \in A^*$ and $\bar v_2\,(u_3 \vee_p v_2v_3) \in A^*$. Moreover, when $u_2v_2 \in \bar A^*$ (elements of the free group $FG(A)$ being always reduced), we also have $\overline{u_2v_2} = \bar v_2\,\bar u_2 \leq_s (u_1u_2 \vee_s v_1)\,\bar u_2$ and $\overline{u_2v_2} = \bar v_2\,\bar u_2 \leq_p \bar v_2\,(u_3 \vee_p v_2v_3)$. We thus conclude that $u.v = ((u_1u_2 \vee_s v_1)\,\bar u_2, u_2v_2, \bar v_2\,(u_3 \vee_p v_2v_3))$ is indeed a tile.

Associativity. We first observe that the pattern-matching conditions are associative: if $u = (u_1, u_2, u_3)$, $v = (v_1, v_2, v_3)$ and $w = (w_1, w_2, w_3)$ are non-zero tiles, the products $(u.v).w$ and $u.(v.w)$ are non-zero tiles if and only if $u_1u_2v_2 \vee_s v_1v_2 \vee_s w_1 \neq 0$ and $u_3 \vee_p v_2v_3 \vee_p v_2w_2w_3 \neq 0$. In that case,
$$(u.v).w = \big((u_1u_2v_2 \vee_s v_1v_2 \vee_s w_1)\,\bar v_2\bar u_2,\ u_2v_2w_2,\ \bar w_2\bar v_2\,(u_3 \vee_p v_2v_3 \vee_p v_2w_2w_3)\big),$$
which just equals $u.(v.w)$.


To prove that $T_A^0$ is an inverse monoid (i.e. that every element has a unique pseudo inverse), it suffices to prove [7] that every element of $T_A^0$ has a (pseudo) inverse and that idempotents commute.

Since $0$ is obviously a pseudo inverse of itself and commutes with every element, we only consider the case of non-zero tiles.

Existence of pseudo inverses. For every $u = (u_1, u_2, u_3) \in T_A$, given $u^{-1} = (u_1u_2, \bar u_2, u_2u_3)$, we do have $u.u^{-1}.u = u$ and $u^{-1}.u.u^{-1} = u^{-1}$. Indeed, by symmetry, since $(u^{-1})^{-1} = u$, it suffices to prove that $u.u^{-1}.u = u$. But one can easily check that $u.u^{-1} = (u_1, 1, u_2u_3)$ (and, symmetrically, $u^{-1}.u = (u_1u_2, 1, u_3)$) and thus $u.u^{-1}.u = (u_1, u_2, u_3)$.

Commutation of idempotents. Since the context tiles of $C_A$, together with $0$, form a commutative submonoid of $T_A^0$ (this immediately follows from the commutativity of $\vee_s$ and $\vee_p$), it suffices to prove that the idempotents of $T_A$ are the context tiles.

Let $u \in C_A$. By definition $u = (u_1, 1, u_3)$, hence $u.u = ((u_1.1 \vee_s u_1).1, 1.1, 1.(u_3 \vee_p 1.u_3))$ and thus $u.u = u$. Conversely, let $u = (u_1, u_2, u_3) \in T_A$ and assume that $u.u = u$. By definition of the product, this means in particular that $u_2u_2 = u_2$, hence $u_2 = 1$ and thus $u \in C_A$.
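The properties used in this proof can be sanity-checked by brute force on a small sample of tiles. The sketch below uses a hypothetical string encoding (lowercase for $A$, uppercase for $\bar A$, `None` for $0$). It enumerates the tiles over $\{a, b\}$ whose components have length at most 1 and tests associativity, the pseudo-inverse identities and the characterization of idempotents. Such a finite check is of course no substitute for the proof.

```python
from itertools import product


def bar(w):
    return w[::-1].swapcase()


def red(w):
    stack = []
    for x in w:
        if stack and stack[-1] == x.swapcase():
            stack.pop()
        else:
            stack.append(x)
    return "".join(stack)


def join_p(u, v):
    return v if v.startswith(u) else u if u.startswith(v) else None


def join_s(u, v):
    return v if v.endswith(u) else u if u.endswith(v) else None


def mul(u, v):
    """Sequential product of tiles; None encodes the zero tile."""
    if u is None or v is None:
        return None
    (u1, u2, u3), (v1, v2, v3) = u, v
    left, right = join_s(red(u1 + u2), v1), join_p(u3, red(v2 + v3))
    if left is None or right is None:
        return None
    return (red(left + bar(u2)), red(u2 + v2), red(bar(v2) + right))


def inv(u):
    """Pseudo-inverse (u1 u2, ū2, u2 u3) of a non-zero tile."""
    u1, u2, u3 = u
    return (red(u1 + u2), bar(u2), red(u2 + u3))


# Every tile over {a, b} whose components have length at most 1.
WORDS = ["", "a", "b"]
SAMPLE = list(product(WORDS, repeat=3)) + [("a", "A", "a"), ("b", "B", "b")]


def associative(sample):
    return all(mul(mul(u, v), w) == mul(u, mul(v, w))
               for u, v, w in product(sample, repeat=3))


def inverse_laws(sample):
    return all(mul(mul(u, inv(u)), u) == u
               and mul(mul(inv(u), u), inv(u)) == inv(u)
               for u in sample)


def idempotents_are_contexts(sample):
    return all((mul(u, u) == u) == (u[1] == "") for u in sample)


print(len(SAMPLE), associative(SAMPLE), inverse_laws(SAMPLE),
      idempotents_are_contexts(SAMPLE))
```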

An immediate property is worth mentioning:


Lemma 2 The mapping $u \mapsto (1, u, 1)$ from $A^*$ to $T_A^0$ is a one-to-one morphism.


In other words, the free monoid $A^*$ can be seen as a submonoid of $T_A^0$. In the remainder of the text, we may use the same notation for words of $A^*$ and their images in $T_A^0$.


Remark. In [8], Lawson already provides a description of overlapping tiles. Non-zero tiles are modeled as triples of the form $(u_1, u_2, u_3) \in A^* \times A^* \times A^*$ with $u_1 \leq_p u_2$ and $u_3 \leq_s u_2$. One can show that the mapping $(u_1, u_2, u_3) \mapsto (u_1, red(\bar u_1 u_2 \bar u_3), u_3)$, which maps Lawson's models to the model presented here, is, once extended to $0$, a monoid isomorphism. In other words, from a mathematical point of view, this hardly makes any difference. However, one may think, as the present authors do, that the model proposed in this paper induces a simpler product definition.
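Lawson's coordinates translate into ours by a single free-group reduction. A sketch of the mapping of the Remark, assuming an illustrative string encoding (lowercase letters for $A$, uppercase for $\bar A$) and a hypothetical function name `from_lawson`:

```python
def bar(w: str) -> str:
    return w[::-1].swapcase()


def red(w: str) -> str:
    stack = []
    for x in w:
        if stack and stack[-1] == x.swapcase():
            stack.pop()
        else:
            stack.append(x)
    return "".join(stack)


def from_lawson(t):
    """Map Lawson's triple (u1, u2, u3), with u1 ≤_p u2 and u3 ≤_s u2,
    to the tile (u1, red(ū1 u2 ū3), u3)."""
    u1, u2, u3 = t
    if not (u2.startswith(u1) and u2.endswith(u3)):
        raise ValueError("not a Lawson triple")
    return (u1, red(bar(u1) + u2 + bar(u3)), u3)


print(from_lawson(("a", "abc", "c")))  # ('a', 'b', 'c')
print(from_lawson(("ab", "ab", "b")))  # ('ab', 'B', 'b'), a negative tile
```

When the prefix $u_1$ and the suffix $u_3$ overlap inside $u_2$, the reduction leaves a word of $\bar A^*$ and the image is a negative tile, as in the second example.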


2.3 Linear walks and the McAlister monoid


We provide here a proof that $T_A^0$ is isomorphic to the McAlister monoid. Again, this also follows from Lawson's characterization of the McAlister monoid [8], which can be shown isomorphic to ours. However, the proof given here, via the monoid $W_A^0$ of linear walks, conveys most of the intuition about the link with two-way automata.


Informally, a walk over a word $u \in A^*$ is a word of $(A + \bar A)^*$ corresponding to a back and forth reading of $u$ (left to right reading is modeled by letters of $A$, and right to left reading by letters of $\bar A$). Not all words of $(A + \bar A)^*$ are walks. Obviously, no factor $a\bar b$ or $\bar a b$ with distinct letters $a$ and $b$ can occur in a walk. But things are a little more complex: a word like $ba\bar a\bar c$, with distinct letters $a$, $b$ and $c$, is still not a walk, since cancelling the back and forth move $a\bar a$ leaves the forbidden factor $b\bar c$. It turns out that walks are easily defined by complement.

Formally, let $L$ be the set of all words $v \in (A + \bar A)^*$ such that there exist a word $u$ with $v \to^* u$ and distinct letters $a$ and $b$ such that either $a\bar b$ or $\bar a b$ is a factor of $u$. A (linear) walk is any word $u \in (A + \bar A)^*$ such that $u \notin L$. Clearly, $L$ is closed under product with arbitrary elements of $(A + \bar A)^*$: it is thus an ideal of $(A + \bar A)^*$.

We define the monoid of walks $W_A^0$ as the Rees quotient $(A + \bar A)^*/L$. It is the monoid obtained from $(A + \bar A)^*$ by merging all elements of $L$ into a zero, the undefined walk. The set of non-zero walks is denoted by $W_A$. By definition, $W_A = (A + \bar A)^* - L$. It should be clear that $L$ (hence $W_A$) is closed under the Wagner congruence.
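The definition of $L$ can be transcribed literally: explore every word reachable by cancelling factors $a\bar a$ or $\bar a a$, and look for a factor $a\bar b$ or $\bar a b$ with $a \neq b$. The sketch below is deliberately naive (its search is exponential in the worst case). It assumes lowercase characters for letters of $A$ and uppercase for their copies, and `in_L` and `is_walk` are illustrative names.

```python
def has_forbidden_factor(w: str) -> bool:
    """True if w has a factor a·b̄ or ā·b with distinct letters a and b."""
    return any(x.islower() != y.islower() and x != y.swapcase()
               for x, y in zip(w, w[1:]))


def in_L(w: str) -> bool:
    """Is some word reachable from w (by cancelling a·ā or ā·a) forbidden?"""
    seen, todo = {w}, [w]
    while todo:
        u = todo.pop()
        if has_forbidden_factor(u):
            return True
        for i in range(len(u) - 1):
            if u[i] == u[i + 1].swapcase():  # cancellable factor
                v = u[:i] + u[i + 2:]
                if v not in seen:
                    seen.add(v)
                    todo.append(v)
    return False


def is_walk(w: str) -> bool:
    return not in_L(w)


print(is_walk("abBA"), is_walk("baAC"))  # True False
```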

Walks and tiles are related by the following Lemma: 


Lemma 3 For every non-zero walk $w \in W_A$, there is a unique tile $\theta(w) = (u_1, u_2, u_3) \in T_A$ such that $\bar u_1u_1u_2u_3\bar u_3 \sim_W w$. In particular, for all walks $w_1$ and $w_2$, $w_1 \sim_W w_2$ if and only if $\theta(w_1) = \theta(w_2)$.

Moreover, extending $\theta$ to a mapping from $W_A^0$ to $T_A^0$ by mapping the undefined walk to $0$, $\theta$ is an onto monoid morphism, i.e., in particular, for all defined walks $w$ and $w' \in W_A$ such that $w.w' \in W_A$, $\theta(w).\theta(w') = \theta(w.w')$.


Proof. Existence. Let $w \in W_A$. The existence of some $\theta(w)$ as above is proved by induction over the number $n(w)$ of turns (or alternations of positive and negative letters) in $w$. Observe that as soon as we prove the existence of $\theta(w)$, we can take $\theta(\bar w) = (\theta(w))^{-1}$. It follows that we only need to prove the existence of either $\theta(w)$ or $\theta(\bar w)$.

When $n(w) = 0$, if $w \in A^*$ we take $\theta(w) = (1, w, 1)$. Symmetrically, if $w \in \bar A^*$ we take $\theta(w) = (\bar w, w, \bar w)$.

Assume $n(w) > 0$. We have $w = xw'$ for some $x \in A^* + \bar A^*$ and $w' \in W_A$ with $n(w') = n(w) - 1$. By induction hypothesis, $\theta(w')$ exists with $\theta(w') = (u_1, u_2, u_3)$ and $w' \sim_W \bar u_1u_1u_2u_3\bar u_3$; thus, since $\sim_W$ is a congruence, $w \sim_W x\bar u_1u_1u_2u_3\bar u_3$. Now, since $W_A$ is closed under the Wagner congruence, this means that $x\bar u_1u_1u_2u_3\bar u_3 \in W_A$, hence, in particular, $x\bar u_1 \in W_A$.

By symmetry, possibly taking $\bar w$ instead of $w$, we may assume that $x \in A^*$. Since $x\bar u_1 \in W_A$, this means that $x \vee_s u_1 \neq 0$. It follows that $(1, x, 1).\theta(w') = ((x \vee_s u_1)\,\bar x, xu_2, u_3) \neq 0$ and, in both cases $x \leq_s u_1$ or $u_1 \leq_s x$, we take $\theta(w) = (1, x, 1).\theta(w')$.

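Indeed, in the first case, $u_1 = u_1'x$ for some $u_1' \in A^*$. We have $(1, x, 1).\theta(w') = (u_1', xu_2, u_3)$ and we know that $w \sim_W x\,\overline{(u_1'x)}\,(u_1'x)\,u_2u_3\bar u_3$, hence $w \sim_W (x\bar x)(\bar u_1'u_1')\,xu_2u_3\bar u_3$, hence, since idempotents commute, $w \sim_W \bar u_1'u_1'\,x\bar xx\,u_2u_3\bar u_3$, hence $w \sim_W \bar u_1'u_1'(xu_2)u_3\bar u_3$.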

In the second case, $x = x'u_1$ for some $x' \in A^*$. We have $(1, x, 1).\theta(w') = (1, xu_2, u_3)$ and $w \sim_W (x'u_1)\bar u_1u_1u_2u_3\bar u_3$, hence $w \sim_W x'u_1u_2u_3\bar u_3$, hence $w \sim_W xu_2u_3\bar u_3$.
The fact that, in both cases, $\theta(w)$ is a well-defined tile (which can also be proved by a tedious case study) just follows from the soundness of our tile product.

Uniqueness. Let $w \in W_A$ with two decompositions $w \sim_W u = \bar u_1u_1u_2u_3\bar u_3$ and $w \sim_W u' = \bar u_1'u_1'u_2'u_3'\bar u_3'$. Since the Wagner congruence is compatible with free group reduction, this implies that $red(u) = red(u')$, hence $u_2 = u_2'$.

Let then $a$ be an arbitrary letter in $A$. On the left side of $w$, since $au_1u \in W_A$, we also have $au_1u' \in W_A$. Hence $au_1 \vee_s u_1' \neq 0$. But since this holds for arbitrary $a \in A$ (the alphabet being assumed to contain at least two distinct letters), this implies that $u_1' \leq_s u_1$. Symmetrically, we also have $u_1 \leq_s u_1'$, hence $u_1 = u_1'$. An analogous argument holds for the right side, thus proving that $u_3 = u_3'$.

Compositionality. The fact that $\theta$, extended to $0$, is a monoid morphism immediately follows from the proof of its existence. Moreover, for every non-zero tile $(u_1, u_2, u_3) \in T_A$, one has $\theta(\bar u_1u_1u_2u_3\bar u_3) = (u_1, u_2, u_3)$, hence $\theta$ is onto.
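The tile $\theta(w)$ can also be computed by moving a head along a line and recording the letter read in each cell. The domain is the set of visited cells, and the root goes from the start position to the final one. The sketch below is an illustration of our own, with a hypothetical string encoding (lowercase for $A$, uppercase for $\bar A$, `None` for $0$). It compares this simulation with the morphism property of the lemma, i.e. with the product of the letter tiles $\theta(a) = (1, a, 1)$ and $\theta(\bar a) = (a, \bar a, a)$. Returning `None` when a cell is read with two different letters is our own choice, matching the informal reading of walks.

```python
from functools import reduce
from itertools import product


def bar(w):
    return w[::-1].swapcase()


def red(w):
    stack = []
    for x in w:
        if stack and stack[-1] == x.swapcase():
            stack.pop()
        else:
            stack.append(x)
    return "".join(stack)


def join_p(u, v):
    return v if v.startswith(u) else u if u.startswith(v) else None


def join_s(u, v):
    return v if v.endswith(u) else u if u.endswith(v) else None


def mul(u, v):
    """Sequential product of tiles; None encodes the zero tile."""
    if u is None or v is None:
        return None
    (u1, u2, u3), (v1, v2, v3) = u, v
    left, right = join_s(red(u1 + u2), v1), join_p(u3, red(v2 + v3))
    if left is None or right is None:
        return None
    return (red(left + bar(u2)), red(u2 + v2), red(bar(v2) + right))


def theta_by_simulation(w):
    """Cell i lies between positions i and i + 1; None on a conflicting cell."""
    cells, pos = {}, 0
    for x in w:
        if x.islower():              # a ∈ A: read cell pos, move right
            cell, pos = pos, pos + 1
        else:                        # ā ∈ Ā: move left, read cell pos - 1
            pos -= 1
            cell = pos
        if cells.setdefault(cell, x.lower()) != x.lower():
            return None
    lo, hi = min(cells, default=0), max(cells, default=-1) + 1

    def seg(i, j):
        return "".join(cells[k] for k in range(i, j))

    root = seg(0, pos) if pos >= 0 else bar(seg(pos, 0))
    return (seg(lo, 0), root, seg(pos, hi))


def theta_by_product(w):
    """Product of the letter tiles θ(a) = (1, a, 1) and θ(ā) = (a, ā, a)."""
    def letter(x):
        return ("", x, "") if x.islower() else (x.lower(), x, x.lower())
    return reduce(mul, map(letter, w), ("", "", ""))


def words_up_to(n, alphabet="abAB"):
    return ["".join(t) for k in range(n + 1) for t in product(alphabet, repeat=k)]


def agree_up_to(n):
    return all(theta_by_simulation(w) == theta_by_product(w) for w in words_up_to(n))


print(theta_by_simulation("abcCB"))  # ('', 'a', 'bc')
print(agree_up_to(5))
```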


Remark. In some sense, this lemma captures most of the combinatorial analysis of two-way automata runs made in [10, 1]. However, since we are only concerned with what two-way automata read rather than with how they perform their readings, Wagner equivalence appears as the appropriate simplification concept.

The previous lemma says that, on $W_A$, the Wagner congruence is the kernel of $\theta$. Now, since $L$ is closed under the Wagner congruence, it can also be seen as an ideal of the free inverse monoid $FIM(A)$, and $FIM(A)/L$ then corresponds to McAlister's original definition of his monoid.


Corollary 4 The monoid of tiles $T_A^0$, isomorphic to the quotient $W_A^0/{\sim_W}$ of walks by the Wagner congruence, is also isomorphic to the McAlister monoid $FIM(A)/L$.


The relationship between walks, tiles and elements of the free inverse monoid is 
summarized by the following commuting diagram. 


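$$\begin{array}{ccc}
(A + \bar A)^* & \longrightarrow & FIM(A) \\
\big\downarrow & & \big\downarrow \\
W_A^0 & \xrightarrow{\ \theta\ } & T_A^0
\end{array}$$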
3 Classes of definable languages of tiles


We define here various notions of regular languages of tiles (or walks). Following a classical habit in formal language theory, for any element $s$ of some monoid $M$, we denote by $s$, depending on the context of use, both the element $s$ and the singleton $\{s\}$.


3.1 Regular and k-regular languages 


As for languages of words, the following operators are defined over languages of tiles. For all $M, N \subseteq T_A$, let

• Sum: $M + N = \{u \in T_A : u \in M \lor u \in N\}$,
• Product: $M.N = \{u.v \in T_A : u \in M, v \in N\}$,
• Star: $M^* = \sum_{k \in \mathbb{N}} M^k$ with $M^0 = \{(1, 1, 1)\}$ and $M^{k+1} = M.M^k$ for every $k \in \mathbb{N}$.


On purpose, we restrict to non-zero tiles. Still, one can check that most classical properties are satisfied. Sum and product of languages are associative and product distributes over sum, i.e. for all languages $L$, $M$ and $N \subseteq T_A$, $L.(M + N) = L.M + L.N$ and $(M + N).L = M.L + N.L$. Moreover, for all languages $M$ and $N \subseteq T_A$, $M^*.N$ is the least solution, with respect to inclusion, of the language equation $X = M.X + N$.


Definition. A language of tiles $M \subseteq T_A$ is regular when it can be defined as the result of finitely many additions, multiplications and star operations on finite languages of non-zero tiles. The class of regular languages of tiles is denoted by REG.

Are there other operators on languages worth defining? An obvious one is the inverse operator: for every $M \subseteq T_A$, let $M^{-1} = \{u^{-1} \in T_A : u \in M\}$. However, the following fact is well known in the theory of inverse monoids:


Lemma 5 The class REG of regular tile languages is closed under the inverse operation.


Proof. Just observe that for all $L, M \subseteq T_A$, we have $(L + M)^{-1} = L^{-1} + M^{-1}$, $(L.M)^{-1} = M^{-1}.L^{-1}$ and $(L^*)^{-1} = (L^{-1})^*$.

This suggests that we need more. Classical and inverse operators on languages are completed by the following unary operator. For every $M \subseteq T_A$, the context projection $M^\circ$ of $M$ is defined by $M^\circ = M \cap C_A$.
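On finite languages, all these operators can be computed directly. The sketch below is an illustration under assumptions of our own: tiles are encoded as triples of strings (lowercase for $A$, uppercase for $\bar A$), the sum is set union, the zero tile (`None`) is discarded from products, and the star is truncated to a bounded number of factors since $M^*$ is usually infinite. It can be used to check identities such as distributivity or $(L.M)^{-1} = M^{-1}.L^{-1}$ on examples.

```python
from itertools import product


def bar(w):
    return w[::-1].swapcase()


def red(w):
    stack = []
    for x in w:
        if stack and stack[-1] == x.swapcase():
            stack.pop()
        else:
            stack.append(x)
    return "".join(stack)


def join_p(u, v):
    return v if v.startswith(u) else u if u.startswith(v) else None


def join_s(u, v):
    return v if v.endswith(u) else u if u.endswith(v) else None


def mul(u, v):
    if u is None or v is None:
        return None
    (u1, u2, u3), (v1, v2, v3) = u, v
    left, right = join_s(red(u1 + u2), v1), join_p(u3, red(v2 + v3))
    if left is None or right is None:
        return None
    return (red(left + bar(u2)), red(u2 + v2), red(bar(v2) + right))


def inv(u):
    u1, u2, u3 = u
    return (red(u1 + u2), bar(u2), red(u2 + u3))


ONE = ("", "", "")


def l_prod(M, N):
    """M.N: all non-zero products u.v with u in M and v in N."""
    products = (mul(u, v) for u, v in product(M, N))
    return {t for t in products if t is not None}


def l_inv(M):
    return {inv(u) for u in M}


def l_context(M):
    """Context projection: the tiles of M whose root is 1."""
    return {u for u in M if u[1] == ""}


def l_star(M, k):
    """M^0 + M^1 + ... + M^k, a finite truncation of M*."""
    result, power = {ONE}, {ONE}
    for _ in range(k):
        power = l_prod(M, power)
        result |= power
    return result


L = {("", "a", ""), ("a", "A", "a")}
M = {("", "b", ""), ("b", "B", "b"), ("", "", "a")}
N = {("a", "", ""), ("", "ab", "")}
print(sorted(l_context(l_prod(l_inv(L), L))))
```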
Definition. For every $k \in \mathbb{N}$, a language $M \subseteq T_A$ is called $k$-regular when either $k = 0$ and $M$ is regular, or $k = k' + 1$ and $M$ can be defined as the result of finitely many additions, multiplications, star and inverse operations on $k'$-regular languages $N$ and context projections $N^\circ$ of $k'$-regular languages. The class of $k$-regular languages is denoted by $k$-REG.

As an immediate consequence of the definition: 


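Corollary 6 $\mathrm{REG} \subseteq 1\text{-}\mathrm{REG} \subseteq 2\text{-}\mathrm{REG} \subseteq 3\text{-}\mathrm{REG} \subseteq \cdots \subseteq k\text{-}\mathrm{REG} \subseteq \cdots$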


3.2 Languages definable in MSO 


Last, the birooted presentation of every non-zero tile can also be seen as an FO-structure. It follows that one can also define the class MSO of languages definable by means of monadic second-order formulas. The following two theorems are proved in [4].


Theorem 7 (Robustness [4]) For all $M$ and $N \subseteq T_A$, if both $M$ and $N$ are MSO-definable, then so are $M + N$, $M.N$, $M^*$, $M^{-1}$ and $M^\circ$.

Corollary 8 For every $k \in \mathbb{N}$, $k$-REG $\subseteq$ MSO.


Theorem 9 (Simplicity [4]) A language $M \subseteq T_A$ is MSO-definable if and only if there are finitely many triples of regular languages of words $(L_i, C_i, R_i)_{i \in I} \in \mathcal{P}(A^*) \times \mathcal{P}(A^*) \times \mathcal{P}(A^*)$ such that $M = \sum_{i \in I} (L_i \times C_i \times R_i)^{e_i}$ with either $e_i = 1$ or $e_i = -1$.


Corollary 10 The classes 1-REG and MSO are equal.


Proof. For all languages of words $L$, $C$ and $R \subseteq A^*$, embedding them in $T_A$, one has $L \times C \times R = (L^{-1}.L)^\circ.C.(R.R^{-1})^\circ$. It follows that, provided $L$, $C$ and $R$ are regular, their embeddings in $T_A$ are also regular (i.e. 0-regular), and thus both $L \times C \times R$ and $(L \times C \times R)^{-1}$ are 1-regular. Then, by Theorem 9, this proves that MSO $\subseteq$ 1-REG; the converse inclusion is Corollary 8.
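The identity at the heart of this proof can be checked on finite word languages. The following sketch assumes an illustrative string encoding of tiles (lowercase for $A$, uppercase for $\bar A$, `None` for the discarded zero tile), and its helper names are hypothetical.

```python
from itertools import product


def bar(w):
    return w[::-1].swapcase()


def red(w):
    stack = []
    for x in w:
        if stack and stack[-1] == x.swapcase():
            stack.pop()
        else:
            stack.append(x)
    return "".join(stack)


def join_p(u, v):
    return v if v.startswith(u) else u if u.startswith(v) else None


def join_s(u, v):
    return v if v.endswith(u) else u if u.endswith(v) else None


def mul(u, v):
    (u1, u2, u3), (v1, v2, v3) = u, v
    left, right = join_s(red(u1 + u2), v1), join_p(u3, red(v2 + v3))
    if left is None or right is None:
        return None
    return (red(left + bar(u2)), red(u2 + v2), red(bar(v2) + right))


def inv(u):
    u1, u2, u3 = u
    return (red(u1 + u2), bar(u2), red(u2 + u3))


def l_prod(M, N):
    products = (mul(u, v) for u, v in product(M, N))
    return {t for t in products if t is not None}


def l_context(M):
    return {u for u in M if u[1] == ""}


def embed(words):
    """Image of a set of words under u ↦ (1, u, 1)."""
    return {("", w, "") for w in words}


def via_operators(L, C, R):
    """(L^{-1}.L)° . C . (R.R^{-1})° computed on finite word languages."""
    left = l_context(l_prod({inv(u) for u in embed(L)}, embed(L)))
    right = l_context(l_prod(embed(R), {inv(u) for u in embed(R)}))
    return l_prod(l_prod(left, embed(C)), right)


L, C, R = {"", "a", "ba"}, {"", "b", "ab"}, {"a", "bb"}
print(via_operators(L, C, R) == {(l, c, r) for l in L for c in C for r in R})  # True
```

Only the diagonal products $l^{-1}.l$ have an empty root, which is why the context projection turns $L^{-1}.L$ into $\{(l, 1, 1) : l \in L\}$.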


Remark. To complete the picture, the class REC of languages of tiles recognizable by means of finite monoids can also be defined. This class is studied in [4]. It is shown to be strongly related to bi-infinite periodic words. As a consequence, it can be shown that REC is strictly included in REG.
4 Two-way automata and regular languages


We prove here that regular languages of tiles are exactly the languages recognizable by finite two-way automata.


4.1 Two-way automata


A finite-state two-way automaton (or 2WA for short) on an alphabet $A$ is a quadruple $\mathcal{A} = (Q, I, F, \Delta)$ with a finite set of states $Q$, a set of initial states $I \subseteq Q$, a set of final states $F \subseteq Q$, and a transition table $\Delta : (A + \bar A) \to \mathcal{P}(Q \times Q)$.

A run of $\mathcal{A}$ over a string of $(A + \bar A)^*$ is a finite sequence $\rho = q_0a_1q_1 \cdots q_{n-1}a_nq_n$ where $n \geq 0$, $q_0, \ldots, q_n \in Q$ and $a_1, \ldots, a_n \in A + \bar A$, such that for every $1 \leq i \leq n$, $(q_{i-1}, q_i) \in \Delta(a_i)$.

The (string) run $\rho$ is accepting if $q_0 \in I$ and $q_n \in F$. In that case, we say that the associated string $s_\rho = a_1 \cdots a_n \in (A + \bar A)^*$ is accepted by the automaton $\mathcal{A}$. The language of strings accepted by $\mathcal{A}$ is written $S(\mathcal{A})$. In that case, the automaton $\mathcal{A}$ is just seen as a classical one-way automaton on the (uninterpreted) alphabet $A + \bar A$.

We say that run $\rho$ is a run over a walk when, moreover, $s_\rho$ is a walk. In that case, we write $w_\rho$ for the string $s_\rho$. The set of walks accepted by $\mathcal{A}$ is written $W(\mathcal{A})$. In other words: $W(\mathcal{A}) = S(\mathcal{A}) \cap W_A$.


Remark. Two-way automata may be defined with the possibility to stand still at the same position while changing state. Clearly, these “silent” transitions are just syntactic sugar that can be replaced with extra non-silent transitions (by the same techniques used to eliminate silent transitions in classical one-way automata). Our definition follows [3].

The following lemma emphasizes the difference between two-way automata (interpreted on walks) and ordinary finite-state automata (interpreted on strings):


Lemma 11 There exists a two-way automaton A such that W(A) is not a regular 
language. 


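Proof. Assume $A = \{a, b\}$ with two distinct letters, let $Q = \{q_0, q_1, q_2, q_3\}$ be a four-state set, and let $\mathcal{A} = (Q, \{q_0\}, \{q_3\}, \Delta)$ with $\Delta(a) = \{(q_0, q_1)\}$, $\Delta(b) = \{(q_1, q_1)\}$, $\Delta(\bar b) = \{(q_1, q_2), (q_2, q_2)\}$ and $\Delta(\bar a) = \{(q_2, q_3)\}$.

Then $S(\mathcal{A}) = ab^*\bar b^+\bar a$. Since a string $ab^n\bar b^m\bar a$ is a walk if and only if $m = n$, it follows that $W(\mathcal{A}) = \{ab^n\bar b^n\bar a : n > 0\}$, hence $W(\mathcal{A})$ is not regular.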
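The automaton of this proof can be run as a one-way automaton on $A + \bar A$, keeping only the accepted strings that are walks. In the sketch below (hypothetical names; uppercase letters encode $\bar a$ and $\bar b$), walks are tested with a literal transcription of the definition of $L$, and $W(\mathcal{A})$ is enumerated up to a length bound.

```python
from itertools import product


def is_walk(w):
    """Literal test of w ∉ L (exponential in the worst case)."""
    def forbidden(u):  # a factor a·b̄ or ā·b with a ≠ b
        return any(x.islower() != y.islower() and x != y.swapcase()
                   for x, y in zip(u, u[1:]))

    seen, todo = {w}, [w]
    while todo:
        u = todo.pop()
        if forbidden(u):
            return False
        for i in range(len(u) - 1):
            if u[i] == u[i + 1].swapcase():  # cancel a·ā or ā·a
                v = u[:i] + u[i + 2:]
                if v not in seen:
                    seen.add(v)
                    todo.append(v)
    return True


def accepts(aut, s):
    """s ∈ S(A): the automaton is read as a one-way automaton on A + Ā."""
    initial, final, delta = aut
    current = set(initial)
    for x in s:
        current = {r for (p, r) in delta.get(x, ()) if p in current}
    return bool(current & final)


# The automaton of Lemma 11; uppercase letters encode ā and b̄.
LEMMA_11 = ({"q0"}, {"q3"}, {
    "a": {("q0", "q1")},
    "b": {("q1", "q1")},
    "B": {("q1", "q2"), ("q2", "q2")},
    "A": {("q2", "q3")},
})


def accepted_walks(aut, max_len):
    """The strings of W(A) of length at most max_len."""
    return {s for n in range(max_len + 1)
            for s in map("".join, product("abAB", repeat=n))
            if accepts(aut, s) and is_walk(s)}


print(sorted(accepted_walks(LEMMA_11, 6), key=len))  # ['abBA', 'abbBBA']
```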
4.2 Word and tile languages recognized by 2WA


The language of words $L(\mathcal{A})$ recognized by a two-way automaton $\mathcal{A}$ on alphabet $A$ (completed by $\bar A$) is the set of words $u \in A^*$ such that there is an accepting run of $\mathcal{A}$ corresponding to a back-and-forth reading of $u$, i.e. a walk $w \in W(\mathcal{A})$ such that $w \sim_W u$. In other words: $L(\mathcal{A}) = \{u \in A^* : (1, u, 1) \in T(\mathcal{A})\}$.


In a similar way, we define the language of tiles $T(\mathcal{A})$ recognized by a two-way automaton $\mathcal{A}$ as the set of tiles $u = (u_1, u_2, u_3) \in T_A$ such that there exists a walk $w \in W(\mathcal{A})$ such that $w \sim_W \bar u_1u_1u_2u_3\bar u_3$, i.e. $w$ is a walk over $u_1u_2u_3$ starting at the entry point of the tile $u$ and ending at its exit point. In other words:

$$T(\mathcal{A}) = \theta(W(\mathcal{A})).$$

We now prove a Kleene theorem for tile languages.


Theorem 12 The regular tile languages are exactly the tile languages recognizable 
by finite-state two-way automata. 


Proof. This follows from Lemmas 15 and 13 below. 


Lemma 13 Every regular language of tiles $M \subseteq T_A$ is recognizable by a finite-state two-way automaton on the alphabet $A$.


Proof. For any finite tile language $M \subseteq T_A$, any one-way automaton recognizing the finite word language $\{\bar u_1u_1u_2u_3\bar u_3 : (u_1, u_2, u_3) \in M\}$ can be viewed as a two-way automaton recognizing $M$. Thus all finite languages of non-zero tiles are recognizable.

More generally, any finite-state one-way automaton $\mathcal{A}$ over the alphabet $A + \bar A$, recognizing a word language $L \subseteq (A + \bar A)^*$, can be viewed as a 2WA recognizing the tile language $\theta(L \cap W_A)$.

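For all word languages $L, M \subseteq (A + \bar A)^*$ we have

• $\theta((L + M) \cap W_A) = \theta(L \cap W_A) + \theta(M \cap W_A)$ (obvious),
• $\theta(LM \cap W_A) = \theta(L \cap W_A).\theta(M \cap W_A)$ (from Lemma 3),
• $\theta(L^* \cap W_A) = (\theta(L \cap W_A))^*$ (a consequence of the definition and of both cases above).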


We conclude the proof by induction on the structure of regular expressions, using 
the classical one-way automata theory. 

Let $\mathcal{A} = (Q, I, F, \Delta)$ be a two-way automaton on the alphabet $A$. For every pair of states $(p, q) \in Q \times Q$, let $T_{p,q}$ denote the language of tiles recognized by the two-way automaton $\mathcal{A}_{p,q} = (Q, \{p\}, \{q\}, \Delta)$.
Lemma 14 The sets $T_{p,q}$, with $p$ and $q \in Q$, are the least solution (w.r.t. inclusion order) of the system of equations

$$T_{p,q} = \delta_{p,q} + \sum_{a \in A + \bar A}\ \sum_{(p,r) \in \Delta(a)} \theta(a).T_{r,q} \qquad (E_{p,q})$$

where $\delta_{p,q} = \{1\}$ if $p = q$ and $\emptyset$ otherwise.


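Proof. For all states $p$ and $q$, and every integer $k \geq 0$, let $W^k_{p,q}$ denote the set of walks of length at most $k$ accepted by $\mathcal{A}_{p,q}$. The identities $W^0_{p,q} = \delta_{p,q}$ and

$$W^{k+1}_{p,q} = \delta_{p,q} + \sum_{a \in A + \bar A}\ \sum_{(p,r) \in \Delta(a)} (a.W^k_{r,q} \cap W_A)$$

immediately follow from the definition of walk acceptance. Then, with $W_{p,q} = \bigcup_{k \in \mathbb{N}} W^k_{p,q}$, by Tarski's fixpoint theorem, the sets $W_{p,q}$ form the least fixed point of the system of equations:

$$W_{p,q} = \delta_{p,q} + \sum_{a \in A + \bar A}\ \sum_{(p,r) \in \Delta(a)} (a.W_{r,q} \cap W_A) \qquad (E'_{p,q})$$

We conclude the proof by taking the images under $\theta$ of all equations, using Lemma 3.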


Lemma 15 For every finite-state two-way automaton $\mathcal{A}$, $T(\mathcal{A})$ is a regular tile language.


Proof. The least solution of the system of equations $(E_{p,q})$ can be computed by a Gaussian elimination of variables, applying the fact that the least solution of an equation of the form $X = U.X + V$ in $\mathcal{P}(T_A)$ is $U^*.V$. This gives a regular expression for every $T_{p,q}$, and thus for $T(\mathcal{A})$ as well, since $T(\mathcal{A}) = \sum_{(p,q) \in I \times F} T_{p,q}$.
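Gaussian elimination with the rule "$X = U.X + V$ has least solution $U^*.V$" can be sketched on the underlying string automaton. The code below is our own illustration, not the paper's construction verbatim. It builds Python regular expressions over $A + \bar A$, with uppercase letters encoding $\bar A$, `None` for the empty language and the empty string for $\{1\}$. Reading each letter $x$ of the result as the tile $\theta(x)$ should give a regular tile expression for $T_{p,q}$, by the identities established in the proof of Lemma 13.

```python
import re
from itertools import product


def alt(x, y):
    """Union of two regexes; None encodes the empty language."""
    if x is None:
        return y
    if y is None:
        return x
    return f"(?:{x}|{y})"


def cat(x, y):
    """Concatenation; the empty string encodes {1}."""
    return None if x is None or y is None else x + y


def star(x):
    return "" if x is None else f"(?:{x})*"


def solve(states, delta, target):
    """Least solution of X_p = δ_{p,target} + Σ_{(p,r) ∈ Δ(a)} a.X_r.

    Each X_k = U.X_k + V is rewritten as X_k = U*.V, then substituted
    into all the other equations.
    """
    coeff = {p: {} for p in states}  # coeff[p][r]: coefficient of X_r in X_p
    const = {p: ("" if p == target else None) for p in states}
    for a, pairs in delta.items():
        for p, r in pairs:
            coeff[p][r] = alt(coeff[p].get(r), re.escape(a))
    for k in states:
        loop = star(coeff[k].pop(k, None))  # U*
        coeff[k] = {r: cat(loop, e) for r, e in coeff[k].items()}
        const[k] = cat(loop, const[k])
        for p in states:
            if p != k and k in coeff[p]:
                c = coeff[p].pop(k)
                for r, e in coeff[k].items():
                    coeff[p][r] = alt(coeff[p].get(r), cat(c, e))
                const[p] = alt(const[p], cat(c, const[k]))
    return const


def regex_of(aut):
    """Sum of the solutions for all pairs (p, q) in I × F."""
    states, initial, final, delta = aut
    rx = None
    for q in sorted(final):
        solution = solve(states, delta, q)
        for p in sorted(initial):
            rx = alt(rx, solution[p])
    return rx


def accepts(aut, s):
    """One-way acceptance of s by the automaton read on A + Ā."""
    states, initial, final, delta = aut
    current = set(initial)
    for x in s:
        current = {r for (p, r) in delta.get(x, ()) if p in current}
    return bool(current & final)


def words_up_to(n, alphabet="abAB"):
    return ["".join(t) for k in range(n + 1) for t in product(alphabet, repeat=k)]


LEMMA_11 = (["q0", "q1", "q2", "q3"], {"q0"}, {"q3"}, {
    "a": {("q0", "q1")}, "b": {("q1", "q1")},
    "B": {("q1", "q2"), ("q2", "q2")}, "A": {("q2", "q3")},
})
CYCLE = (["p", "q"], {"p"}, {"p"},
         {"a": {("p", "q")}, "A": {("q", "p")}, "b": {("q", "q")}})
print(regex_of(LEMMA_11))  # a(?:b)*B(?:B)*A
```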


Corollary 16 (Shepherdson) Every language of words recognizable by a two-way automaton is regular.


Proof. Let $\mathcal{A}$ be a finite two-way automaton. The language $L(\mathcal{A}) \subseteq A^*$ of words recognized by $\mathcal{A}$ is defined to be $L(\mathcal{A}) = \{u \in A^* : (1, u, 1) \in T(\mathcal{A})\}$. Now let $\#$ be a new letter. By Lemma 15, the language $\#.T(\mathcal{A}).\#$ is regular. Moreover, since $\# \notin A$, we also have $\#.T(\mathcal{A}).\# \subseteq \{1\} \times \#A^*\# \times \{1\}$, and thus $\#.T(\mathcal{A}).\# = \#.L(\mathcal{A}).\#$. We then conclude by applying Theorem 9.
5 Pebble automata and k-regular languages


In this section, we prove that $k$-regular languages are captured by pebble automata. The number of pebbles is not explicitly encoded in our definition of pebble automata (as with the invisible pebbles of [2]). It follows that $k$-pebble automata are (equivalently) defined, in our approach, as (invisible) pebble automata restricted to runs using at most $k$ pebbles.


5.1 Multi-pebble automata 


A finite-state pebble automaton (or PWA for short) on an alphabet $A$ is a quadruple $\mathcal{A} = (Q, I, F, \Delta)$ with a finite set of states $Q$, a set of initial states $I \subseteq Q$, a set of final states $F \subseteq Q$, and a transition table $\Delta : (A + \bar A + \{1^+, 1^-\}) \to \mathcal{P}(Q \times Q)$. For every $a \in A + \bar A$, $\Delta(a)$ tells how $a$ can be read, as in a two-way automaton. In addition, $\Delta(1^+)$ tells how a pebble can be left on the current position and $\Delta(1^-)$ tells how a pebble can be removed. In other words, a pebble automaton is a two-way automaton that has the capacity, from time to time, to leave and remove pebbles placed between letters of the underlying word.

Of course, a pebble cannot be removed if it has not been left before. Moreover, the behaviour of pebble automata must be restricted so that only the last pebble put on a word may be removed (in a LIFO style). Otherwise, pebble automata could define languages of accepting runs of Turing machines.

These two restrictions are conveniently captured by the following definition of 
pebble automata runs. 


A position configuration is a non-empty finite sequence of (positive or negative) integers $\pi = n_0 \cdots n_k \in \mathbb{Z}^+$, where $\mathbb{Z}^+$ denotes the set of non-empty finite sequences of integers. The intended meaning of the position configuration $\pi$ is that $n_i$ records the relative number of letters (positive or negative) read from the $i$-th pebble left on the input word, the initial starting point being modeled as a sort of 0-th pebble.

Moreover, since any pebble other than the 0-th one is eventually removed in a run, we only record the position relative to the last pebble left: pushing a pebble on the stack freezes the previously recorded relative positions. It follows that, at any time, $n_i + n_{i+1} + \cdots + n_k$ denotes the number of (positive or negative) letters that separate the automaton head from the position of the $i$-th pebble. In particular, when $n_k = 0$, the $k$-th pebble can be removed.


Formally, a run of the pebble automaton $\mathcal{A}$ is a finite word $\rho \in ((Q \times \mathbb{Z}^+).(A + \bar A + 1))^*.(Q \times \mathbb{Z}^+)$ such that, for every factor of $\rho$ of the form $(q, \pi).a.(q', \pi')$ with $a \in A + \bar A + 1$, one of the following conditions is satisfied:


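• $(q, q') \in \Delta(a)$, $a \in A + \bar A$ and $\pi' = \pi + \delta_a$, where adding $\delta_a$ to a position configuration adds it to its last component, with $\delta_a = 1$ if $a \in A$ and $\delta_a = -1$ if $a \in \bar A$,
• $(q, q') \in \Delta(1^+)$, $a = 1$ and $\pi' = \pi.0$,
• $(q, q') \in \Delta(1^-)$, $a = 1$ and $\pi = \pi'.0$.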


As before, we also assume that the projection $w_\rho$ of $\rho$ on $(A + \bar A)^*$ is a non-zero walk.

We observe that position configurations are handled as a (left to right) stack. Leaving a pebble amounts to pushing $0$, the new relative position of the head from that pebble. Reading $a \in A + \bar A$ changes the relative position from the last pebble left by $\delta_a$. Removing a pebble amounts to popping the last relative position of the head from that pebble, which is forced to be $0$. This way, we model the fact that the head must have moved back to the position where the pebble was left.
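The stack discipline on position configurations can be sketched as follows. This only illustrates the update rules (no states and no walk check). The names are hypothetical, lowercase and uppercase characters encode letters of $A$ and $\bar A$, and `'+'` and `'-'` stand for the moves $1^+$ and $1^-$.

```python
from typing import Optional, Tuple

Config = Tuple[int, ...]  # position configuration n_0 ... n_k, never empty


def step(config: Config, move: str) -> Optional[Config]:
    """Apply one move to a position configuration.

    Returns None when the move is not allowed.
    """
    if move == "+":
        return config + (0,)             # push 0: the head is on the new pebble
    if move == "-":
        if len(config) < 2 or config[-1] != 0:
            return None                  # no pebble to remove, or head not on it
        return config[:-1]               # pop the (zero) relative position
    delta = 1 if move.islower() else -1  # δ_a
    return config[:-1] + (config[-1] + delta,)


def pebbles_used(moves: str) -> Optional[int]:
    """Largest number of pebbles simultaneously left, or None if a move fails."""
    config, used = (0,), 0
    for m in moves:
        config = step(config, m)
        if config is None:
            return None
        used = max(used, len(config) - 1)
    return used


print(pebbles_used("a+bB-A"))  # 1
print(pebbles_used("a+b-"))    # None
```

The value returned by `pebbles_used` is the least $k$ such that every configuration has length at most $k + 1$, which is how the number of pebbles used in a run is defined.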


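The number of pebbles used in a run $\rho \in ((Q \times \mathbb{Z}^+).(A + \bar A + 1))^*.(Q \times \mathbb{Z}^+)$ is defined to be the least integer $k \in \mathbb{N}$ such that $\rho \in ((Q \times \mathbb{Z}^{\leq k+1}).(A + \bar A + 1))^*.(Q \times \mathbb{Z}^{\leq k+1})$, where $\mathbb{Z}^{\leq m}$ stands for the set of non-empty sequences of integers of length at most $m$.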

Still writing $w_\rho$ for the projection of run $\rho$ on the alphabet $A + \bar A$, we say that a triple $u = (u_1, u_2, u_3)$ is accepted by automaton $\mathcal{A}$ with at most $k$ pebbles when there exists a run $\rho$ using at most $k$ pebbles such that $w_\rho \sim_W \bar u_1u_1u_2u_3\bar u_3$, with start state of the form $(q, 0)$ with $q \in I$ and end state of the form $(q', i)$ with $q' \in F$. A simple check of our definition shows that, in that case, $i = |u_2|$ when $u_2 \in A^*$ and $i = -|u_2|$ when $u_2 \in \bar A^*$.


5.2 Pebbles vs tiles context operators 


From now on, a $k$-pebble automaton is defined as a many-pebble automaton whose runs are only allowed to use at most $k$ pebbles.


Theorem 17 For every $k \in \mathbb{N}$, the $k$-regular tile languages are exactly the tile languages recognizable by finite-state $k$-pebble automata.


Proof. This follows from Lemmas 18 and 19 below.


Lemma 18 Every language of tiles definable by a finite $k$-pebble automaton is $k$-regular.
Proof. Let $\mathcal{A} = (Q, I, F, \Delta)$ be a finite many-pebble automaton. For every pair of states $(p, q) \in Q \times Q$ and every integer $k \geq 0$, let $T^k_{p,q} \subseteq T_A$ be the language of tiles recognized by automaton $\mathcal{A}$ with at most $k$ pebbles from state $p$ to state $q$. Let also $C^k_{p,q}$ be the associated set of context tiles defined by $C^k_{p,q} = T^k_{p,q} \cap C_A = (T^k_{p,q})^\circ$.

First, one can prove, as in the previous section, that the sets $T^k_{p,q}$ form the least solution of the system of equations defined, for all $p$ and $q \in Q$, by the equations $(E_{p,q})$ for $T^0_{p,q}$, as before, and, for every $k > 0$, by:


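$$T^k_{p,q} = \delta_{p,q} + \sum_{a \in A + \bar A}\ \sum_{(p,r) \in \Delta(a)} \theta(a).T^k_{r,q} + \sum_{(p,p') \in \Delta(1^+)}\ \sum_{(r',r) \in \Delta(1^-)} C^{k-1}_{p',r'}.T^k_{r,q}$$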


Indeed, we just mimic in these equations all the possible cases to build a run: either some letter $a \in A + \bar A$ is read, or a pebble is used. Of course, we check that $T^k_{p,q}$ only depends on languages of the form $T^{k'}_{r,q}$ with $k' \leq k$, or $C^{k'}_{p',r'} = (T^{k'}_{p',r'})^\circ$ with $k' < k$. The latter case shows in particular that no circular dependency involves projections on context tiles. It follows that this system can be solved by induction on $k \in \mathbb{N}$, by Gaussian elimination of variables. Then, for every $k \in \mathbb{N}$, given the $k$-regular expressions defining the languages $T^k_{p,q}$, we conclude by taking $T^k(\mathcal{A}) = \sum_{(p,q) \in I \times F} T^k_{p,q}$.


Lemma 19 Every $k$-regular language of tiles is definable by a finite $k$-pebble automaton.


Proof. As for regular languages of tiles, we proceed by induction on the syntactic complexity of $k$-regular expressions, building and combining multi-pebble automata. We just detail the construction for the context operator. Given a finite automaton $\mathcal{A} = (Q, I, F, \Delta)$ $k$-recognizing the language $T = T^k(\mathcal{A})$, we define the automaton $\mathcal{A}' = (Q', I', F', \Delta')$ by taking $Q' = Q \cup \{q_0, q_f\}$ with $q_0$ and $q_f$ two new states, $I' = \{q_0\}$, $F' = \{q_f\}$ and, for every $a \in A + \bar A$, $\Delta'(a) = \Delta(a)$, $\Delta'(1^+) = \Delta(1^+) \cup (\{q_0\} \times I)$ and $\Delta'(1^-) = \Delta(1^-) \cup (F \times \{q_f\})$. One can easily check that the automaton $\mathcal{A}'$ recognizes, with at most $k + 1$ pebbles, the language $T^\circ = T^k(\mathcal{A}) \cap C_A$, i.e. $T^{k+1}(\mathcal{A}') = T^\circ$. The fact that $k$-regular languages only need $k$ pebbles immediately follows from that construction.
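The construction of $\mathcal{A}'$ only adds two states and two families of pebble transitions. A sketch of it on a dictionary-based transition table, where the fresh state names and the keys `'+'` and `'-'` (for $1^+$ and $1^-$) are assumptions of ours:

```python
def context_automaton(aut, q0="q_init", qf="q_fin"):
    """Build A' = (Q', I', F', Δ') from A = (Q, I, F, Δ) as in Lemma 19."""
    states, initial, final, delta = aut
    if q0 in states or qf in states or q0 == qf:
        raise ValueError("q0 and qf must be two fresh states")
    new_delta = {x: set(pairs) for x, pairs in delta.items()}          # Δ'(a) = Δ(a)
    new_delta.setdefault("+", set()).update((q0, q) for q in initial)  # Δ(1+) ∪ {q0} × I
    new_delta.setdefault("-", set()).update((q, qf) for q in final)    # Δ(1-) ∪ F × {qf}
    return (set(states) | {q0, qf}, {q0}, {qf}, new_delta)


A = ({"p", "q"}, {"p"}, {"q"}, {"a": {("p", "q")}, "+": {("q", "q")}})
A2 = context_automaton(A)
print(sorted(A2[3]["+"]), sorted(A2[3]["-"]))
```

The transition sets are copied, so the original automaton is left unchanged.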


Corollary 20 Every language of words recognizable by a $k$-pebble automaton is regular.
Proof. The proof goes just as the above proof of Shepherdson's theorem. Let $\mathcal{A}$ be a finite $k$-pebble automaton. The language $L(\mathcal{A}) \subseteq A^*$ of words recognized by $\mathcal{A}$ is defined to be $L(\mathcal{A}) = \{u \in A^* : (1, u, 1) \in T^k(\mathcal{A})\}$. Again, given $\# \notin A$, by Lemma 18, the language $\#.L(\mathcal{A}).\# = \#.T^k(\mathcal{A}).\# \subseteq \{1\} \times \#A^*\# \times \{1\}$ is $k$-regular, hence MSO-definable by Corollary 8, and we conclude by applying Theorem 9.


Conclusion 


We have shown how the McAlister monoid is the adequate domain of interpretation of two-way and many-pebble automata on words. As trees can also be embedded in $FIM(A)$, this strongly suggests that some adequate Rees quotient (by the ideal of non-tree-shaped birooted trees) of the free inverse monoid could play the same role for tree-walking automata. Such a study is currently under development in connection with the notion of quasi-recognizability defined by the second author [5].